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Abstract 

The Poincare constant Ry of a random variable Y relates the 
L 2 (Y)-norm of a function g and its derivative g' . Since Ry — Var (Y) 
is positive, with equality if and only if Y is normal, it can be seen as a 
distance from the normal distribution. In this paper we establish the 
best possible rate of convergence of this distance in the Central Limit 
Theorem. Furthermore, we show that Ry is finite for discrete mix- 
tures of normals, allowing us to add rates to the proof of the Central 
Limit Theorem in the sense of relative entropy. 



1 Introduction and results 

Poincare (or spectral gap) inequalities provide a relationship between L 2 
norms on functions and their derivatives. 

Definition 1.1 (Borovkov and Utev) Given a random variable Y , define 
the Poincare constant Ry: 

Varfl(r) 

where Hi (Y) is the space of absolutely continuous functions on the real line 
such that Var g(Y) > and Eg'(y) 2 < oo. 
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Ry will not in general be finite, however it will be finite for the normal 
and other strongly unimodal distributions (see for example Klaasen (1985), 
Chernoff (1981), Chen (1982), Cacoullos (1982), Nash (1958), Borovkov and 
Utev (1984)). 

We will exploit various relationships between the Poincare constant and 
Fisher information: 

Definition 1.2 For a random variable Y with smooth density p, define the 
score function py(y) = p'(y)/p(y), o,nd Fisher information I(Y) = Ep{Y) 2 . 

Notice that for a given Y, if g is a local maximum of Var g(Y)/(Eg'(Y) 2 ) 
then for all functions h and small t: 



so multiplying out, < t 2 (R g Eh' 2 - Eh 2 ) + 2t(R g Eg'h' - Egh), which can 
only hold in an interval around zero if: 



Integration by parts implies therefore that g = —Rg{pyg' + g"), so local 
maxima correspond to eigenfunctions of the Laplacian D Y g = (pyQ 1 + g"), 
and the global maximum to the least strictly negative eigenvalue (hence the 
alternative name of spectral gap inequality). 

Example 1.3 The Poincare constant can be infinite. For example, consider 
the discrete random variable, where P(X = 1) = P(X = —1) = 1/2. Then, 
we can choose g such that g'(—l) = g'iX) = e, but g{— 1) = —1, g(l) = 1, 
so that Var g(X) = 1, but Eg'(X) 2 = e 2 . This argument will work for any 
discrete random variable, indeed any random variable whose support is not 
an interval. 

However, our first main result shows that discrete random variables per- 
turbed by small normals have a finite Poincare constant: 

Theorem 1.4 Consider X , a random variable with variance o 2 taking a 
finite number of values with probabilities pi,P2, ■ ■ - Pn respectively, and Z T an 



Var (g + th) Var (g) 
E{g' + th') 2 ~ Eg' 2 



= R, 



RgEg'h' = Egh. 



(1) 
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indepedent normal with mean zero variance r. Then Y T = X + Z T satisfies a 
Poincare inequality with constant 



Note that part 8 of Theorem 1.1 of Utev (1992) also shows that Ry T is 
finite. However, our bound has an explicit dependence on a 2 , and so has 
independent interest. 

In a paper by Johnson and Barron (2002), we show that finiteness of the 
Poincare constant gives an explicit rate of convergence of relative entropy 
distance in the Central Limit Theorem. This is a strong result, and implies 
convergence in L 1 . 

Theorems 2, 3 and 4 of Borovkov and Utev provide the following results: 
Lemma 1.5 For the constant Rx defined above: 
1- Rax+b = a 2 R x 

2. If X, Y are independent, then Rx+y < Rx + Ry 

3. Rx > Var (X), with equality if and only if X is normal 



4- If Rx is finite then Eexp(|X — KX\/12y/R x ) < 2, so X has moments 

of all orders. 

5. If i?x„/Var (X n ) — > 1, then Kw(X n ) — ► Kw(Z) ,where Z is normal, for 
any continuous w with \w(t)\ < exp(c|t|) ; for sufficiently small c. 

The first three properties are reminiscent of those of Fisher Information - a 
subadditive relation holds and the minimising case characterises the normal 
distribution. In analogy with the approach to the Central Limit Theorem 
developed by Brown (1982) and Barron (1986), we add an extra term into 
the subadditive relation Rx+y < Rx + Ry, which is sandwiched as conver- 
gence occurs. This gives us an answer to the question posed by Chen and 
Lou (1990), of identifying the limit of the Poincare constant in the Central 
Limit Theorem. This was also answered by Utev (1992), though without the 
explicit rate of convergence that we provide. 




Proof See Section 



□ 
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Theorem 1.6 Consider X\, X2, ■ ■ ■ IID, with R = Rx t and I = I{X) finite. 
Defining U n = (Xi+. . . X n ) / V no 2 , then there exists a constant C, depending 
only on I and R, such that Ru n — 1 < C/n. 

Proof See Section |. □ 



We can argue that this is the best possible rate, up to the choice of the 
constant. Considering g(x) = x 2 — 1, we know that Rx > (EX 4 — l)/4. 
Since EU% n - 1 = (Ef/ 4 - l)/2 > 0, if R Un - 1 = f(n)/n then f{2 k )/2 k = 
R Uk -l> (EX 4 - l)/2 k , so f(2 k ) > EX 4 - 1 and hence does not tend to 



y 2 k 

zero. 



Since we can perturb random variables by adding small normals to ensure 
that the Fisher information is finite, we use this to prove a strong form of 
the Central Limit Theorem. 

Theorem 1.7 IfX-y, X2, ■ ■ ■ are IID random variables with mean 0, variance 
a 2 and finite Rx = R, then U n = (Xi + . . . X n ) / v 'no 2 has the property that: 

Ew{U n ) -> Ew(Z), 

where Z is standard normal, for any continuous w such that w(t) < expc\t\, 
where c < c = l/(12-/ff). 

Proof Given a random variable U, define U T ~ U + Z T . Now, since Rjjt ^ 
Ru n + r < i? + r, by Lemma |T~5|.4, taking r such that c = l/(12\/i? + To): 

Eexp(c|f4 r |)<2, 

for all n, if r < r , c < c . 

Now, since have uniformly bounded Fisher information I(U^) < I{Z T ) = 
1/r, and Rut < R + r, Theorem [TB] implies that Kw(U^) — > Ew(Z T ). 

Hence, since: 

\Ew(U n ) -Ew(Z)\ 
< \Ew(U^) -Ew(U n ) \ + \Ew(U^) -Ew(Z T )\ + |Ew(Z) - Ew(Z T ) |, 

we need only show that given e, |Ew(f/ T ) — Ew(U)\ < e for r small enough. 
This follows by uniform integrability arguments (see Theorem 25.12 of Billings- 
ley), since E|w(?7 r )| p < Eexpcp|C/ T | < 2, for some small p, and since w(U T ) 
converges weakly to w(U). □ 
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2 Finiteness of R for mixtures of normals 

Proof of Theorem |1.4| Without loss of generality, consider X taking a 



finite number of values a\ > a 2 > ■ ■ ■ > a n with probabilities pi,P2, ■ ■ - Pn 
respectively, where EX = J2iPi a i = 0- 

We introduce the 'squared span' M = max(|a 2 — af|, |a| — a||, . . . |a^_ 1 — 
a 2 |, (ai - a 2 ) 2 , (a 2 - a 3 ) 2 , . . . (a n -i - a n ) 2 ), and write p for min s p s . 

By Theorem 1 of Borovkov and Utev, we need to check that for some R and 
all x, the density f T of Y satisfies: 

POO 

/ yf T (y)dy<Rf T (x). (2) 

J X 

Since f T (y) = Y^iPi&Av ~ a i)i the LHS of Equation (Q) becomes (defining 

U j = Yll=lPi a i and a n+l = -OO): 



poo n poo n POO 

/ Vfr(y)dy = ^Pi / (y - di)(f) T (y - a { )dy + ^2piCi t / 4> T (y 
Jx - =1 i=1 A 

n n poo 

= ^2piT<f) T {x - di)dy + J^pa / <j) T {y)dy 

i=l i=l ./ai-Oi 

n n PX—C 

= rf T {x) + ^2piai^2 I 

i=i j=i Jx ~ a i 



di)dy 



(f) T (y)dy 



j 

n-l 



" 1 I rx-a j+1 \ 

/ T (ac) / <Pr(y)dy , 



3=1 

since u n = 0, so for each interval Ij = (x — aj , x — Oj+i) we need to consider 
bounds on min^^ y 2 . 

We write r for the index such that a r < x < a r _i. 

First, we consider x > 0, where we can distinguish 3 cases: for j < r; 
x — Oj+i < 0, so for y G If 

y 2 > (x — cij + i) 2 = (x — dj) 2 + 2(cij — a,j + i)x + (a| +1 — a 2 ) = (x — a,) 2 — M. 

For j = r; y G /j means that: y 2 > > (x — dj) 2 — (oj_i— a,,) 2 > [x—a^) 2 — M. 
For j > r; x — > 0, so for y G J,: y 2 > (x — a,j) 2 . 
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Hence for all j, min ye j. y 2 > (x — aj) 2 — M, so: 

4> T (y)dy < (aj — a,- + i) max <p T (y) < (aj — aj + i)(f) T (x — aj) exp(M/2r). 



In Lemma 2A, we prove two technical results, that Uj(aj — aj + i) < a , and 



that Mp < 2cr . This allows us to deduce that for x > 0: 

n-l / -x— a.-,i \ n— 1 



/ px-a J+1 \ »-x 

^2 u j [ ^r(y)dy\ < exp(M/2r)^2uj(aj - a j+1 )(j) T (x - aj) 

. I V J X — ai J 

n-l 

< exp(M/2r)^ a 2 (p T (x — aj) 



3=1 
n-l 

< exp(a 2 /Tp)^2a 2 (pj/p)(f) T (x - aj), 

3=1 

as required. 

Similarly, for x < 0, we deduce that for y e /j, min y6 j. y 2 > (x — %+i) 2 — M, 
and thus: 

1-1 / pX-Uj+l \ n ~i 

J / <Pr(y)dy) < exp(M/2r)J2 

Uj(aj — a j+1 )<j) T (x — a j+1 ) 

j= i \Jx- aj J j=1 

< exp(a 2 /rp)^2a 2 (p j+1 /p)(j) T (x - a j+1 ). 

3=1 

□ 



Lemma 2.1 Using the notation above: Uj(aj — aj + i) < a 2 , and Mp < 2a 2 . 

Proof Note that Uj = Y^i=\Pi a i = ~ Y!i=j+iPi a i- 

For a j+1 > 0: Uj(aj - a j+1 ) < UjOj < Yli=iPi a i a j - Y^i=\Vi a \- 

For aj < 0: Uj(aj - a j+1 ) < -Uja j+1 < J27=j+iPi a i a 3+i < J27=j+iPi a l 

For a j+ i < < ay. Uj(aj - a j+1 ) < Y.'' i + Y!i=j+iPi a i a j+i - 

En o 
i= iPi a i- 
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For the second part, we consider two cases, firstly where M = a 2 s — a 2 s±1 . In 
this case: 

a 2 = ^Pta 2 > p s a 2 s = p s (a 2 s - a 2 s±1 ) + p s a 2 s±1 > pM. 
t 

Alternatively, if M — (a s — a s+ i) 2 then: 

° 2 = $>*a t 2 > p{a 2 s + a 2 s+1 ) > (p/2)(a s - a s+1 ) 2 > pM/2. 
t 

□ 

3 Convergence of the Poincare constant 

We establish an explicit rate of convergence of the Poincare constant, using 
projection inequalities similar to those in Johnson and Barron (2002) 

Lemma 3.1 Given independent random variables X,Y with Poincare con- 
stants Rx, Ry , for any function g: 

Var g(X + Y) < (R x + R Y )Eg'(X + Y) 2 - ^-^-^ Var g\X + Y), 
and hence Rx+y < Rx + Ry- 

Proof Without loss of generality, we can consider g such that Kg(X+Y) = 0, 
and define h(u) = W* Y g{u + Y), which thus also has mean zero. Now: 

Var#(X + r) = Eg 2 (X + Y) 

= E x {Eg 2 (X + Y)\X) 

= E x V^(g(X + Y)\X) + E x (Eg(X + Y)\X) 2 

< R Y Ex(E Y g'(X + Y) 2 \X)+Eh(X) 2 

< R Y Eg' 2 (X + Y) + R x Eh'(X) 2 . 

To consider the second term, we use the score function p Y and define: 
f(x)=E Y [(g'(x + Y)-h'(x))p Y (Y)\, 
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where by the Stein equation, f(x) = — Ey(/'(x + Y) — —h"(x). Further, by 
Cauchy- S chwarz : 

Eh"(X) 2 = Ef(X) 2 < I(Y)E(g'(X + Y) - h'(X)) 2 , 

so that: 

Var h'(X) < R x Eh"(X) 2 < R X I(Y) (Eg'(X + Y) 2 - Eh'(X) 2 ) , 
and writing fi = Eh'(X) = Eg'{X + Y), we obtain: 

Eti(X) 2 {l + R X I(Y)) < R x I(Y)Eg'(X + Y) 2 + /i 2 , 
which, rearranging, leads to: 



E/i'(V) 2 < E#'(X + Y) 2 - 



RxKY) + 1 



□ 



Next we need a Lemma which again uses the idea that if g' is nearly con- 
stant, then g is close to linear. We'd like to apply it to the optimal g, which 



achieves the maximum in Definition 1.1. However, rather than use compact 



ness arguments to show such a function exists, we can instead use a 'good' g 
instead. 

Lemma 3.2 For any random variable W with mean zero, and any function 
g such that R(t) = Var (g(W) + tW)/E(g'(W) +t) 2 has a local maximum at 



Eg'(W) 2 J ~ V Eg'(W) 2 ' 

Proof Without loss assume that Eg(W) = 0, and write \i = Eg'(W) and 
5 g = Var g'(W)/Eg'(W) 2 = 1 - ^/Eg'{W) 2 implies: 

Eg 2 (W) 

= E(g(W) - nW) 2 + 2fjEW{g(W) - /jW) + fi 2 EW 2 

< E(g(W) - fiW) 2 + 2\fi\ VVar (W)^E(g(W) - fiW) 2 + /i 2 Var (W) 



< (Eg'(W) 2 ) [R w \6 g + 2^5 g (l - 5 g ) ) + (1 - <y Var (W) 

< (Eg'(W) 2 ) (m w ^/5~ g + Var (W) 
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since E{g{W)-^W) 2 < R w E(g'(W)-fi) 2 = R w Var (g'(W)) < R w 5 g Eg\W) 2 , 
and since by Lemma [I7|.3, Var (W) < Rw- D 



Note, we can come up with tighter bounds: for example taking h(W) = W 
in Equation ( TO, EWg ( W) = R(0)fi. Hen ce fi( R(0) - Var (W)) = E(g (W) - 
yW)W < VVar (W)y/E(g(W) - fiW) 2 < ^/R w Var (W)Var g'(W). This 
implies that: 



R(0) - Var (W) < R 



w t 



s 9 



1-S g 



However, Lemma |372| is sufficient for our purposes. 

Proof of Theorem O) We consider convergence along the 'powers of 2' 
subsequence S k = U 2 k, which implies convergence for the whole sequence by 
subadditivity. 

For all k, 1 < R Sk < R, and I Sk < I. Taking X = S k /^2 and Y = S' k /V2 (an 
identical copy) in Lemma |5TT] implies (since R Sk /^/2 = Rs k /2 an d I(S k /\^2) = 
21 (Sk)) that for any g: 

Varg'(S k+1 ) rfv^fn ^ar g(S k+1 ) \ 

Now, given W = S k+ \, we can find h such that Var h/Eh' 2 > max(R — 
e, Var (W)). Since Var (h(W) + tW)/E(h' + 1) 2 tends to Var (W) at ±oo, 
and has one maximum to and one minimum, we can find g(W) = h(W)+toW, 



which satisfies the conditions of Lemma 3.2 



((Rs k+1 - e - 1) + ) 2 < ( ^y^J - I)' < 9Rl k J g < C(R Sk -R Sk+1+ e), 

where x+ = max(x, 0) and C = 18R 2 Sk+i (l/ R Sk + I(S k )) < 18R(IR+ 1). 
That is, since e is arbitrary, 

(R Sk+1 -l) 2 <C(R Sk+1 -R Sk ). (3) 

Note that since Rs k is decreasing and bounded below, successive differences 
tend to zero, and thus Rs k — > 1. 

To obtain a rate, write u k = (Rs k — Equation (|3|) gives u k (l + u k ) < 

itfc-i. Since u k are decreasing: . . . m^ +2 < w^+i — u n — u n-i ~u n , and hence: 
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u\ < - u n , u\ < m„_ 2 - itti-1, < m„_ 3 - M n _ 2 and so on. Summing, 
we obtain that for m < n: (n — m)v? n < u m — u n < u m . Taking n = 2 r , 
m = 2 r_1 implies that: 

/U 2 r-1 

Repeating this times, we deduce that (since u k < u± < 1): 

U2r < 2 -Er=i(r-i)/^ = 2 -r+2 2 (r-2-iV)/2^ j 

so if = r — 2, then < 4/2 r , and we can 'fill in the gaps' by subadditivity, 
to show that u k < 16/k for all k. □ 
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